Conversation
Force-pushed from 5c0cd25 to d3505c6.
This is now consistent, since GMX_LOG is rank-0 only and cerr is not allowed per the GROMACS style guide.
1. The bonded interaction building (make_bondeds_zone) runs for all zones but won't find any cross-zone bondeds when !hasInterAtomicInteractions(); it's a no-op for the extra zones.
2. The exclusion building (make_exclusions_zone) correctly builds exclusion entries for all i-zone atoms, satisfying the pairlist assertion.

Added nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions) to ensure the exclusion-building loop covers all i-zones when an intermolecularExclusionGroup is present. Without this, 3D DD (e.g. 2x2x2 with 8 ranks) has numIZones=4 but nzone_bondeds=1, so exclusion lists are only built for zone-0 atoms while the nbnxm assertion expects them for zones 0-3. A standalone sketch of the zone-count logic follows below.
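For illustration, here is a self-contained C++ sketch of the zone-count clamp described above. The variable names mirror the description, but the surrounding GROMACS domain-decomposition code is only paraphrased; the flags and numbers below are made up for the example.

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // Example inputs: a system with no inter-atomic (cross-zone) bonded
    // interactions but with an intermolecular exclusion group, decomposed
    // 2x2x2 over 8 ranks.
    const bool hasInterAtomicInteractions       = false;
    const bool haveIntermolecularExclusionGroup = true;
    const int  numZones                         = 8;
    const int  numIZonesForExclusions           = 4;

    // Without cross-zone bondeds only zone 0 needs bonded assignment...
    int nzone_bondeds = hasInterAtomicInteractions ? numZones : 1;

    // ...but exclusion entries must still be built for every i-zone atom,
    // otherwise the nbnxm pairlist assertion fails for zones 1-3.
    if (haveIntermolecularExclusionGroup)
    {
        nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions);
    }

    std::printf("bonded/exclusion building loops over zones [0, %d)\n", nzone_bondeds);
    return 0;
}
```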
1. localToModelIndex_ sized to numLocalPlusHalo instead of signal.x_.size() (was an out-of-bounds write).
2. augmentGhostPairs rewritten to correctly identify halo MTA atoms by iterating localToModelIndex_ from numLocalAtoms_ onward, instead of incorrectly slicing the full coordinate array.
3. Shift vector computed as pair.dx() - (positions_[B] - positions_[A]) in model space, then rounded to integer cell shifts and recomputed from the box vectors for consistency.
4. Deduplication of pairs using std::set<tuple> to handle overlap between signal pairs and augmented halo-halo pairs.
5. Timer instrumentation via a MetatomicTimer RAII class around key phases.

A standalone sketch of items 3 and 4 is shown below.
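The following sketch of items 3 and 4 assumes an orthorhombic box, so the integer cell-shift solve reduces to a per-component division; all names and values are illustrative, not the actual implementation.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <set>
#include <tuple>

int main()
{
    const std::array<double, 3> boxDiag = { 4.0, 4.0, 4.0 };
    const std::array<double, 3> posA    = { 0.2, 0.1, 0.3 };
    const std::array<double, 3> posB    = { 3.9, 0.2, 0.1 };
    const std::array<double, 3> pairDx  = { 0.3, 0.1, -0.2 }; // dx reported for the pair

    // Item 3: the residual between the reported dx and the naive model-space
    // difference is the periodic image offset; round it to integer cell
    // shifts and rebuild the shift from the box so it is exactly consistent.
    std::array<int, 3>    cellShift{};
    std::array<double, 3> shift{};
    for (int d = 0; d < 3; d++)
    {
        const double raw = pairDx[d] - (posB[d] - posA[d]);
        cellShift[d]     = static_cast<int>(std::lround(raw / boxDiag[d]));
        shift[d]         = cellShift[d] * boxDiag[d];
    }
    std::printf("cell shift: %d %d %d\n", cellShift[0], cellShift[1], cellShift[2]);

    // Item 4: deduplicate (i, j, cell shift) pairs across signal pairs and
    // augmented halo-halo pairs with a std::set of tuples.
    std::set<std::tuple<int, int, int, int, int>> seen;
    const bool inserted =
            seen.insert(std::make_tuple(0, 1, cellShift[0], cellShift[1], cellShift[2])).second;
    std::printf("pair inserted: %s\n", inserted ? "yes" : "no");
    return 0;
}
```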
```cpp
// TODO: For multi-layer GNN models (MACE, NequIP, etc.) the
// interaction_range should be n_layers * cutoff so that DD halos
// are deep enough for message-passing. Many models currently
```
I think there is nothing we can do here. The model has to declare the correct interaction_range property, and we also say that this has to include the message-passing layers.
Makes sense; I mean, we have the max_cutoff stuff.
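To make the relationship in that TODO concrete, here is a toy calculation; the cutoff and layer count are made up, and the point is only that the declared interaction_range must already account for the message-passing depth.

```cpp
#include <cstdio>

int main()
{
    const double cutoffPerLayer = 0.5; // nm, neighbour-list cutoff of one message-passing layer
    const int    numLayers      = 3;   // e.g. a MACE/NequIP-style model

    // The engine only sees the declared interaction_range, so a multi-layer
    // model has to declare the full message-passing reach itself.
    const double interactionRange = numLayers * cutoffPerLayer;

    std::printf("declared interaction_range should be %.2f nm\n", interactionRange);
    return 0;
}
```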
```cpp
// With thread-MPI, each rank is a thread sharing the same process.
// PyTorch's internal OpenMP would spawn N threads per rank, causing
// massive oversubscription (e.g. 12 ranks × 12 OMP threads = 144
// threads on 12 cores). Force single-threaded torch operations.
```
I thought we still oversubscribe even with this check?
Sure, but that's on the user; without this, PyTorch separately tries to use the OMP variable.
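A minimal sketch of the guard being discussed, assuming libtorch's at::set_num_threads (which controls the intra-op thread pool) and a caller-provided haveThreadMpi flag; this is not the actual patch.

```cpp
#include <ATen/Parallel.h>

// With thread-MPI every rank is a thread in the same process, so letting each
// rank's libtorch spawn its own OpenMP pool oversubscribes the node.
void limitTorchThreadsForThreadMpi(bool haveThreadMpi)
{
    if (haveThreadMpi)
    {
        at::set_num_threads(1); // intra-op threads used by PyTorch kernels
    }
}
```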
```cpp
if (fp)
{
    data_->dtype = torch::kFloat32;
    std::fprintf(fp,
```
Shouldn't this use the GROMACS log mechanism instead of a plain fprintf?
So the GROMACS logger only works on the main rank, and the docs suggest not using iostreams:

> Use STL, but do not use iostreams outside of the unit tests. iostreams can have a negative impact on performance compared to other forms of string streams, depending on the use case. Also, they don't always play well with using C stdio routines at the same time, which are used extensively in the current code. However, since Google tests rely on iostreams, you should use it in the unit test code.

So fprintf seemed like the best choice; it's also used all over the code.
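For contrast, a rough sketch of the two options discussed here; gmx::MDLogger/GMX_LOG and the per-rank log FILE* are standard GROMACS facilities, but the function and message below are invented for illustration.

```cpp
#include <cstdio>

#include "gromacs/utility/logger.h"

void reportModelDtype(const gmx::MDLogger& mdlog, FILE* fplog)
{
    // GMX_LOG only produces output where the logger has a target, i.e. the
    // main rank; use it for messages that should appear once.
    GMX_LOG(mdlog.info).appendText("Metatomic model dtype: float32");

    // Per-rank diagnostics go through the rank's own log FILE*.
    if (fplog != nullptr)
    {
        std::fprintf(fplog, "Metatomic model dtype on this rank: float32\n");
    }
}
```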
I like the timer a lot. However, you said there is a GROMACS internal timer; the question is whether we have to use that to get the code upstream. But maybe we can think about this once we start that step.
Yeah, currently it is env-var gated, but maybe they'd prefer us to use theirs.
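Since MetatomicTimer itself isn't shown in this thread, here is a generic sketch of what an env-var-gated RAII phase timer can look like; the class name, environment variable, and output format are all placeholders.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Times the enclosing scope and prints the elapsed wall time on destruction,
// but only when the (hypothetical) GMX_METATOMIC_TIMING variable is set.
class ScopedPhaseTimer
{
public:
    explicit ScopedPhaseTimer(std::string phase) :
        phase_(std::move(phase)), start_(std::chrono::steady_clock::now())
    {
    }
    ~ScopedPhaseTimer()
    {
        if (std::getenv("GMX_METATOMIC_TIMING") == nullptr)
        {
            return;
        }
        const double ms = std::chrono::duration<double, std::milli>(
                                  std::chrono::steady_clock::now() - start_)
                                  .count();
        std::fprintf(stderr, "[metatomic timer] %s: %.3f ms\n", phase_.c_str(), ms);
    }

private:
    std::string                                        phase_;
    std::chrono::time_point<std::chrono::steady_clock> start_;
};

// Usage: ScopedPhaseTimer timer("augmentGhostPairs");
```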
Look away until vesin#3.

nnpot style implementation #1

Or "real domain decomposition"...

Basically the LAMMPS style where the model loads everywhere, computes on every rank.

WIP. Needs testing to ensure consistency.

All set. Closes #7.